This paper gives a precise characterization of the fundamental limits of adaptive sensing for
diverse estimation and testing problems concerning sparse signals. We consider in particular the
setting introduced in Haupt, Castro and Nowak (2011) and show necessary conditions on the
minimum signal magnitude for both detection and estimation: if $x \in \mathbb{R}^n$ is a sparse vector with
$s$ non-zero components then it can be reliably detected in noise provided the magnitude of the
non-zero components exceeds $\sqrt{2/s}$. Furthermore, the signal support can be exactly identified
provided the minimum magnitude exceeds $\sqrt{2 \log s}$. Notably there is no dependence on $n$, the
extrinsic signal dimension. These results show that the adaptive sensing methodologies proposed
previously in the literature are essentially optimal, and cannot be substantially improved. In
addition these results provide further insights on the limits of adaptive compressive sensing.

1. Introduction

This paper addresses the characterization of the fundamental limits of adaptive sensing 
in sparse settings, when a potentially infinite number of observations is available but 
there is a restriction on the sensing precision budget available. One of the key aspects 
of adaptive sensing is that the data collection process is sequential and adaptive. In dif- 
ferent fields these sensing/experimenting paradigms are known by different names, such 
as sequential experimental design in statistics and economics (see Wald (1947); Bessler 
(1960); Fedorov (1972); El-Gamal (1991); Hall and Molchanov (2003); Lai and Robbins 
(1985); Blanchard and Geman (2005)), and active learning or adaptive sensing/sampling in
computer science, engineering and machine learning (see Cohn, Ghahramani and Jordan
(1996); Freund et al. (1997); Novak (1996); Korostelev and Kim (2000); Dasgupta (2004);
Castro, Willett and Nowak (2005); Dasgupta, Kalai and Monteleoni (2005); Dasgupta
(2005); Hanneke (2010); Koltchinskii (2010); Balcan, Beygelzimer and Langford (2006);
Castro and Nowak (2008)). 

The extra flexibility of adaptive sensing can sometimes (but not always) yield significant
performance gains. In this paper we are particularly concerned with the setting
in Haupt, Castro and Nowak (2011), where the authors propose an adaptive sparse signal
recovery method that provably improves on the best possible non-adaptive sensing
methods. However, that work gives no indication of the fundamental performance
limitations in such sensing scenarios. This paper fills those gaps in our understanding,
and shows that the proposed procedures are essentially asymptotically optimal
for estimation problems. Furthermore, with some modifications, the procedure of
Haupt, Castro and Nowak (2011) is also nearly optimal when testing for the presence of
a sparse signal. In addition, we also present results characterizing the fundamental
limitations in several other settings, such as exact support recovery, as in Malloy and Nowak
(2011b, a) or in Arias-Castro, Candès and Davenport (2011).

2. Problem Setting 

Let $x \in \mathbb{R}^n$ be an unknown vector. We assume this vector is sparse in the sense that only
a small number of its entries are non-zero. In particular let $S$ be a subset of $\{1, \ldots, n\}$
and assume that for all $i \in \{1, \ldots, n\}$ such that $i \notin S$ we have $x_i = 0$. We refer to $S$ as the
signal support set and this is our main object of interest. In this paper we consider two
distinct classes of problems: (i) signal support estimation, where we desire to estimate $S$;
(ii) signal detection, where we simply want to test if $S$ belongs to some particular class.

In our model the signal x is unknown, but we can collect partial information through 
noisy observations. In particular we observe 

$$Y_k = x_{A_k} + \Gamma_k^{-1/2} W_k \quad \forall k \in \{1, 2, \ldots\} , \tag{2.1}$$

where $A_k, \Gamma_k$ are taken to be measurable functions of $\{Y_i, A_i, \Gamma_i\}_{i=1}^{k-1}$, and $W_k$ are standard
normal random variables, independent of $\{Y_i\}_{i=1}^{k-1}$ and also independent of $\{A_i, \Gamma_i\}_{i=1}^{k}$.
In this model $A_k \in \{1, \ldots, n\}$ corresponds to the entry of $x$ that is measured at time
$k$, therefore $A_k$ can be viewed as the sensing action taken at time $k$. Similarly $\Gamma_k$ is the
precision of the measurement taken at time $k$. Finally, there is a total sensing budget
constraint that must be satisfied, namely

$$\sum_{k=1}^{\infty} \Gamma_k \le m , \tag{2.2}$$

where $m > 0$. It is important to note that we can consider either deterministic sequential
designs or random sequential designs. In the latter we allow the choices $A_k$ and $\Gamma_k$ to
incorporate extraneous randomness, which is not explicitly described in the model. Besides
being more general this extra flexibility often facilitates the analysis. The collection
of conditional distributions of $A_k, \Gamma_k$ given $\{Y_i, A_i, \Gamma_i\}_{i=1}^{k-1}$ for all $k$ is referred to as the
sensing strategy, and denoted by $\mathcal{A}$. Note that, within the sensing model above, we can
also consider non-adaptive sensing frameworks, meaning the choice of sensing actions and
precision allocation must be made before collecting any data. Formally this means that
$\{A_k, \Gamma_k\}_{k \in \mathbb{N}}$ is statistically independent from $\{Y_k\}_{k \in \mathbb{N}}$. Note that a non-adaptive design
can still be random.
The case $m = n$ is of particular interest and is often considered in the literature as
it allows direct comparison between adaptive and non-adaptive sensing methodologies.
If $m = n$ we allow, on average, one unit of precision for each one of the $n$ signal entries.
Therefore if we assume the signal $x$ belongs to a class for which there is no reason to
give a priori preference to any particular signal entry the optimal non-adaptive sensing
strategy amounts to measuring each vector entry exactly once, with precision one$^{1}$. This
is obviously the classical normal means model.
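
As an informal illustration, the observation model (2.1) and the non-adaptive design just described can be simulated in a few lines of Python. This is only a sketch: the helper names `observe` and `non_adaptive_uniform`, the 0-based indexing, and the particular values of the dimension, sparsity and amplitude are assumptions made for the example and do not come from the paper.

```python
import random

def observe(x, index, precision, rng):
    """One noisy measurement Y = x[index] + precision**(-1/2) * W, with W ~ N(0, 1)."""
    return x[index] + rng.gauss(0.0, 1.0) / precision ** 0.5

def non_adaptive_uniform(x, m, rng):
    """Measure every entry exactly once with equal precision m / n.

    The design (which entries, which precisions) is fixed before any
    data is collected, so it is non-adaptive; the total precision is m.
    """
    n = len(x)
    precision = m / n
    design = [(i, precision) for i in range(n)]
    data = [observe(x, i, g, rng) for i, g in design]
    return design, data

rng = random.Random(0)
n, s, mu = 1000, 10, 3.0                       # hypothetical sizes and amplitude
x = [mu if i < s else 0.0 for i in range(n)]   # support {0, ..., s-1} (0-based)
design, data = non_adaptive_uniform(x, m=n, rng=rng)
total_precision = sum(g for _, g in design)    # sensing budget actually used
```

With $m = n$ every entry receives precision one, so `data` is a single draw from the normal means model.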

In the following sections we consider two different scenarios: signal detection/testing 
and signal estimation. In both cases the extra flexibility of adaptive sensing is shown to 
be extremely rewarding. We characterize the fundamental performance limits of adap- 
tive sensing in those scenarios and show that these limits can be achieved by practical 
inference methodologies. 



3. Signal Detection 

In this setting we are interested in a binary hypothesis testing problem, where we test a
simple null hypothesis against a composite alternative. In particular, the null hypothesis
$H_0$ is simply $S = \emptyset$, and the alternative hypothesis $H_1$ is $S \in \mathcal{C}$, where $\mathcal{C}$ is some class of
non-empty subsets of $\{1, \ldots, n\}$. We are particularly interested in the case when under
the alternative $H_1$ all the sets in $\mathcal{C}$ have cardinality $s$, meaning that for all $S \in \mathcal{C}$ we have
$|S| = s$. We will consider only such classes, as this greatly simplifies the presentation and
is not, for the most part, a restrictive condition.
Define

$$x_{\min} = \min\{|x_i| : x_i \neq 0,\ i \in \{1, \ldots, n\}\} .$$

In the following we characterize the fundamental signal detection limits, in particular
identifying conditions on $x_{\min}$, as a function of $\mathcal{C}$ and $n$, such that no procedure is able
to reliably distinguish the two hypotheses. Furthermore these bounds are essentially
tight, in the sense that there exist practical procedures matching them. For simplicity we
consider only non-negative signals, meaning that $x_i \ge 0$ for all $i \in \{1, \ldots, n\}$. This greatly
simplifies the analysis, without hindering the generality of the results. Further comments
on this are given in Remark 3.2. Furthermore the hardest signals to detect or estimate
are of the form

$$x_i = \begin{cases} \mu & \text{if } i \in S \\ 0 & \text{otherwise} \end{cases} . \tag{3.1}$$

This means that we can restrict our analysis to signals of the form above, which are
entirely described by the signal support set $S$ and signal amplitude $\mu$. This is also the
class of signals considered in Addario-Berry et al. (2010) or in Donoho and Jin (2004) in
a non-adaptive sensing context.
$^{1}$Due to statistical sufficiency there is no gain in measuring each signal entry more than once.

Let

$$D = \{Y_i, A_i, \Gamma_i\}_{i \in \mathbb{N}} ,$$

and let $d = \{y_i, a_i, \gamma_i\}_{i=1}^{\infty}$ be a particular realization of the experimental procedure.
Let $\mathcal{A}$ denote a particular sensing strategy, and $\hat\Phi(D) \in \{0,1\}$ be an arbitrary testing
function, taking the value 1 if the null hypothesis is to be rejected, and zero otherwise.
For notational convenience we write simply $\hat\Phi$ where the hat indicates the dependency
on the data $D$. The risk of this procedure is given by

$$R(\hat\Phi) = P_\emptyset(\hat\Phi \neq 0) + \max_{S \in \mathcal{C}} P_S(\hat\Phi \neq 1) ,$$

where $P_S$ denotes the joint probability distribution of $\{Y_i, A_i, \Gamma_i\}_{i=1}^{\infty}$ for a given value of
$S$. Likewise we use $E_S$ to denote expectation under $P_S$.
Now define

$$c(\mu, \mathcal{C}) = \inf_{\hat\Phi, \mathcal{A}} R(\hat\Phi) = \inf_{\hat\Phi, \mathcal{A}} \left[ P_\emptyset(\hat\Phi \neq 0) + \max_{S \in \mathcal{C}} P_S(\hat\Phi \neq 1) \right] . \tag{3.2}$$

Our formal goal is to identify the values of the signal magnitude $\mu$ for which we have
necessarily $c(\mu, \mathcal{C}) \ge \epsilon$ for some $\epsilon > 0$.

Remark 3.1. The choice of risk above is obviously not the only one possible, and in
the literature other choices of risk have been considered, such as

$$\bar R(\hat\Phi) = \max\left\{ P_\emptyset(\hat\Phi \neq 0),\ \max_{S \in \mathcal{C}} P_S(\hat\Phi \neq 1) \right\} , \tag{3.3}$$

or

$$\tilde R(\hat\Phi) = P_\emptyset(\hat\Phi \neq 0) + \frac{1}{|\mathcal{C}|} \sum_{S \in \mathcal{C}} P_S(\hat\Phi \neq 1) . \tag{3.4}$$

As discussed in Addario-Berry et al. (2010), the latter measure of risk corresponds to the
view that, under the alternative hypothesis, a set $S \in \mathcal{C}$ is selected uniformly at random
from $\mathcal{C}$. Clearly

$$\tilde R(\hat\Phi) \le R(\hat\Phi) \le 2 \bar R(\hat\Phi) \le 2 R(\hat\Phi) .$$

If there is sufficient symmetry in $\mathcal{C}$ and $\hat\Phi$ these three risk measures are essentially
identical. Whenever possible we characterize the fundamental limits of adaptive sensing for
each one of the risk measures, but focus primarily on $R(\hat\Phi)$.



3.1. Main Results - Detection 

The class $\mathcal{C}$ of all subsets of $\{1, \ldots, n\}$ with cardinality $s$ is one of particular interest.
This is the class of maximal size, and obviously the one for which we expect the worst
performance for detection. Perhaps surprisingly, under the adaptive sensing paradigm,
the exact same performance lower bound is obtained for any class $\mathcal{C}$ exhibiting some
very mild symmetry. This means that, in many situations, the structure of the class
$\mathcal{C}$ does not really help under the adaptive sensing scenario. This is in stark contrast
with non-adaptive sensing scenarios, where the structure of the set $\mathcal{C}$ can play a very
prominent role, as is well documented in Addario-Berry et al. (2010); Arias-Castro et al.
(2008); Butucea and Ingster (2011). To state the main result of this section we need the
following definitions:

Definition 3.1 (symmetric class/full range). Let $\mathcal{S} = \bigcup_{S \in \mathcal{C}} S$ and let $S$ be drawn uniformly
at random from $\mathcal{C}$. If for all $i \in \mathcal{S}$ we have $P(i \in S) = s/|\mathcal{S}|$ the class $\mathcal{C}$ is said to
be symmetric. Furthermore if $|\mathcal{S}| = n$ the class is said to be full range.

It is remarkable that many classes $\mathcal{C}$ of interest satisfy this mild symmetry, as for
instance, all the classes in Addario-Berry et al. (2010).

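Theorem 3.1. Consider the setting above and let $\mathcal{C}$ be a symmetric class. Let $\hat\Phi(D)$
be an arbitrary testing procedure, where $D = \{Y_i, A_i, \Gamma_i\}_{i \in \mathbb{N}}$. Finally let $0 < \epsilon < 1$ be
arbitrary. If $R(\hat\Phi) \le \epsilon$ then necessarily

$$x_{\min} \ge \sqrt{\frac{2|\mathcal{S}|}{sm} \log\frac{1}{2\epsilon}} , \tag{3.5}$$

with $\mathcal{S} = \bigcup_{S \in \mathcal{C}} S$ as in Definition 3.1.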
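
To get a feel for the bound (3.5), a minimal numerical sketch follows. The function name `detection_lower_bound` and the sample values are hypothetical and chosen only for illustration; the code simply evaluates the right-hand side of (3.5) and is not part of the original analysis.

```python
import math

def detection_lower_bound(range_size, s, m, eps):
    """Evaluate sqrt(2*|S|/(s*m) * log(1/(2*eps))), the right-hand side of (3.5).

    range_size is the cardinality of the union of all sets in the class
    (equal to n for a full range class). Returns 0.0 when eps >= 1/2,
    where the bound carries no information.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    log_term = math.log(1.0 / (2.0 * eps))
    if log_term <= 0.0:
        return 0.0
    return math.sqrt(2.0 * range_size / (s * m) * log_term)

# Full range class with budget m = n: the bound depends on s but not on n.
small = detection_lower_bound(range_size=10**3, s=100, m=10**3, eps=0.05)
large = detection_lower_bound(range_size=10**6, s=100, m=10**6, eps=0.05)
```

In this sketch `small` and `large` coincide, which reflects the absence of the extrinsic dimension from the bound when the budget equals the dimension; quadrupling the sparsity halves the value.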

This result gives a condition on the minimal signal magnitude necessary to ensure the 
detection risk is not too large. Perhaps surprisingly the lower bound does not include 
any factor involving specific structural properties of C, but only the range and cardinality 
of the corresponding sets. A possible way to understand this comes from the following 
observation: for detection, it suffices to identify a single element of S, and there is no 
need to identify all the elements. Therefore cues provided by the structure are not very 
informative. Before proving this result it is interesting to present a simple corollary for 
the case of full range classes, emphasizing the asymptotic behavior. 

Corollary 3.1. Let $\mathcal{C}$ be a symmetric and full range class of sets with cardinality $s$,
where $s$ can be a function of $n$ (this dependence is not explicitly stated). Let $\hat\Phi_n$ be an
arbitrary adaptive sensing testing procedure. If

$$\lim_{n \to \infty} R(\hat\Phi_n) = 0$$

then necessarily

$$x_{\min} \ge \omega_n \sqrt{\frac{2n}{sm}} ,$$

where $\omega_n$ is a sequence for which $\lim_{n \to \infty} \omega_n = \infty$.

This corollary gives a necessary condition for detection consistency. As shown in Proposition 3.3
this bound is actually tight, meaning there are adaptive sensing procedures that
can detect signals satisfying the above condition. The case $m = n$ is particularly interesting,
as it allows the comparison between adaptive and non-adaptive sensing performance.




Rui M. Castro 



For that case, adaptive sensing detection is possible if $x_{\min} = \omega_n \sqrt{2/s}$. Since $\omega_n$ can diverge
arbitrarily slowly we see that the extrinsic signal dimension $n$ plays no significant
role in this bound, and only the intrinsic dimension $s$ is relevant. Keep in mind, however,
that $\omega_n$ is related to the rate of convergence of the risk $R(\hat\Phi_n)$ to zero. Corollary 3.1
is in stark contrast to what is known for the same problem if one restricts to the classical
setting of non-adaptive sensing, as in Ingster (1997); Ingster and Suslina (2003);
Donoho and Jin (2004); Donoho (2006). For instance, for the class of all subsets with
cardinality $s$ it is necessary to have $x_{\min} \ge c\sqrt{\log n}$ if $s = o(\sqrt{n})$, where the factor $c > 0$
depends on the specific relation between $s$ and $n$. In Meinshausen and Rice (2006) the
authors considered estimation of the proportion of significant components $|S|/n$. Their
setting is more general, as the distributions corresponding to significant and insignificant
signal component observations can be non-normal. Their approach can be used to
test the hypothesis $|S| = 0$. For the Gaussian case, they recover essentially the $\sqrt{\log n}$
scaling. Finally, in Cai, Jin and Low (2007) the authors consider again the estimation of
the fraction of significant signal components in the normal means case, and show results
beyond consistency, including minimax rates of convergence of the risk. We now proceed
with the proof of the theorem and a discussion about tightness of the bounds.

Proof of Theorem 3.1. The proof of this lower bound hinges, as usual, on the analysis
of likelihood ratios. Begin by defining the joint probability density function of
$\{Y_k, A_k, \Gamma_k\}_{k=1}^{\infty}$ under $S$, which we denote by

$$f(d; S) = f(y_1, a_1, \gamma_1, y_2, a_2, \gamma_2, \ldots; S) .$$

Note that this is properly defined for a certain dominating measure (mixed continuous
and discrete). Taking into account the conditional dependences in our observation model
we can factorize this probability density function as follows

$$f(d; S) = f_{A_1,\Gamma_1}(a_1, \gamma_1) \times f_{Y_1|A_1,\Gamma_1}(y_1 | a_1, \gamma_1; S) \times f_{A_2,\Gamma_2|Y_1,A_1,\Gamma_1}(a_2, \gamma_2 | y_1, a_1, \gamma_1) \times f_{Y_2|A_2,\Gamma_2}(y_2 | a_2, \gamma_2; S) \times \cdots$$

Note that in this factorization only some terms involve the underlying true set $S$ while
all the other terms depend solely on the sensing strategy used. This greatly simplifies the
computation of likelihood ratios, as all the terms not involving $S$ cancel out. In particular
the likelihood ratio between two hypotheses is given simply by

$$\mathrm{LR}_{S,S'}(d) = \frac{f(d; S)}{f(d; S')} = \prod_{k=1}^{\infty} \frac{f_{Y_k|A_k,\Gamma_k}(y_k | a_k, \gamma_k; S)}{f_{Y_k|A_k,\Gamma_k}(y_k | a_k, \gamma_k; S')} .$$

As usual, in order to effectively distinguish if the underlying true distribution is parameterized
by $S$ or $S'$ the corresponding likelihood ratio needs to be significantly different
from 1. We proceed by formally stating this. Our analysis is heavily inspired by the
approach in Chernoff (1959).
The first step is to relate the probabilities of type I and type II errors to the likelihood
ratio, namely giving a relation between $P_S(\hat\Phi \neq 1)$ and $P_\emptyset(\hat\Phi \neq 0)$ where $S$ is an arbitrary
element of $\mathcal{C}$. Begin by defining the total variation and the Kullback-Leibler divergence
between two probability measures.

Definition 3.2. Let $P_0$ and $P_1$ be two probability measures defined on a common measurable
space $(\Omega, \mathcal{B})$. The total variation distance is defined as

$$\mathrm{TV}(P_0, P_1) = \sup_{B \in \mathcal{B}} |P_0(B) - P_1(B)| .$$

The Kullback-Leibler divergence is defined as

$$\mathrm{KL}(P_0 \| P_1) = \begin{cases} \int \log \frac{dP_0}{dP_1} \, dP_0 & \text{if } P_0 \ll P_1 \\ +\infty & \text{otherwise} \end{cases}$$

The total variation is a proper distance, unlike the Kullback-Leibler divergence. Both
are always non-negative but the latter is not symmetric. Note that, whenever $P_0$ and $P_1$
have a common dominating measure one can define the corresponding densities $f_0$ and
$f_1$, and the Kullback-Leibler divergence is simply given by

$$\mathrm{KL}(P_0 \| P_1) = E\left[ \log \frac{f_0(X)}{f_1(X)} \right] ,$$

where $X$ is a random variable with distribution given by $P_0$. Therefore we get simply
the expected value of a log-likelihood ratio. Consider now the setting in this paper. As
shown in Tsybakov (2009) the total variation is closely related to the infimum of the sum
of type I and type II error probabilities, namely, for any binary (test) function $\hat\Phi$ we have

$$P_\emptyset(\hat\Phi \neq 0) + P_S(\hat\Phi \neq 1) \ge 1 - \mathrm{TV}(P_\emptyset, P_S) .$$

Evaluating the total variation distance is generally difficult, but using Lemma 2.6 of
Tsybakov (2009) we can relate it to the Kullback-Leibler divergence, which is generally
much easier to evaluate. Namely

$$1 - \mathrm{TV}(P_\emptyset, P_S) \ge \frac{1}{2} \exp\left(-\mathrm{KL}(P_\emptyset \| P_S)\right) .$$
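
The last inequality can be checked numerically in the simplest possible case, a single measurement with precision $\gamma$, for which the two laws are $N(0, 1/\gamma)$ and $N(\mu, 1/\gamma)$. The sketch below relies on the standard closed forms for a Gaussian mean shift (total variation $\mathrm{erf}(\mu\sqrt{\gamma}/(2\sqrt{2}))$ and Kullback-Leibler divergence $\gamma\mu^2/2$); the function names are hypothetical, and this is a sanity check under those assumptions rather than part of the proof.

```python
import math

def one_minus_tv(mu, precision):
    """1 - TV between N(0, 1/precision) and N(mu, 1/precision)."""
    delta = abs(mu) * math.sqrt(precision)   # mean shift in standard deviations
    # TV = erf(delta / (2*sqrt(2))), so 1 - TV is the complementary error function.
    return math.erfc(delta / (2.0 * math.sqrt(2.0)))

def kl_divergence(mu, precision):
    """KL divergence between the same two Gaussians: precision * mu^2 / 2."""
    return precision * mu ** 2 / 2.0

def bound_holds(mu, precision):
    """Check 1 - TV >= (1/2) * exp(-KL) for this pair of Gaussians."""
    return one_minus_tv(mu, precision) >= 0.5 * math.exp(-kl_divergence(mu, precision))

checks = [bound_holds(mu, g)
          for mu in (0.1, 0.5, 1.0, 2.0, 5.0)
          for g in (0.25, 1.0, 4.0)]
```

Using `math.erfc` rather than `1 - math.erf(...)` avoids cancellation when the two distributions are far apart.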

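Putting these two results together we obtain a simple relation between the Kullback-Leibler
divergence and the probabilities of error:

$$\mathrm{KL}(P_\emptyset \| P_S) \ge -\log\left( 2 P_\emptyset(\hat\Phi \neq 0) + 2 P_S(\hat\Phi \neq 1) \right) . \tag{3.8}$$

To simplify the notation let $\mathrm{LR}_{S,S'} \equiv \mathrm{LR}_{S,S'}(D)$. From Equation 3.8 we conclude that

$$E_\emptyset[\log \mathrm{LR}_{\emptyset,S}] = \mathrm{KL}(P_\emptyset \| P_S) \ge -\log\left( 2 P_\emptyset(\hat\Phi \neq 0) + 2 P_S(\hat\Phi \neq 1) \right) .$$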
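Since the choice of set $S$ was completely arbitrary, we have the bound

$$\min_{S \in \mathcal{C}} E_\emptyset[\log \mathrm{LR}_{\emptyset,S}] \ge \min_{S \in \mathcal{C}} \left\{ -\log\left( 2 P_\emptyset(\hat\Phi \neq 0) + 2 P_S(\hat\Phi \neq 1) \right) \right\} . \tag{3.9}$$

At this point it is important to note that, if we desire to have $R(\hat\Phi) \le \epsilon$ for some
$0 < \epsilon < 1$ then $P_\emptyset(\hat\Phi \neq 0) + P_S(\hat\Phi \neq 1) \le \epsilon$ (for any $S \in \mathcal{C}$), and therefore

$$\min_{S \in \mathcal{C}} E_\emptyset[\log \mathrm{LR}_{\emptyset,S}] \ge \log\frac{1}{2\epsilon} . \tag{3.10}$$

The next step of the proof entails deriving a good upper bound on $\min_{S \in \mathcal{C}} E_\emptyset[\log \mathrm{LR}_{\emptyset,S}]$
and comparing it to the lower bound just shown.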

As noted before, the expected log-likelihood ratio is actually the Kullback-Leibler divergence
between $P_\emptyset$ and $P_S$. This obviously depends on the sensing strategy $\mathcal{A}$ that is used.
Therefore we need to get an upper bound on

$$\sup_{\mathcal{A}} \min_{S \in \mathcal{C}} E_\emptyset[\log \mathrm{LR}_{\emptyset,S}] . \tag{3.11}$$

It is instructive to compare the above expression with the one of the minimax error (3.2).
Note that the roles of the max/sup and min/inf are reversed. This should not come as
a surprise as larger values of $E_\emptyset[\log \mathrm{LR}_{\emptyset,S}]$ correspond to lower probabilities of error.
Note also that $E_\emptyset[\log \mathrm{LR}_{\emptyset,S}]$ can be interpreted as the payoff matrix of a game where the
sensing strategy makes the first move, and nature is the opponent that chooses a sparsity
pattern in an adversarial way. Now note that



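$$\begin{aligned}
E_\emptyset[\log \mathrm{LR}_{\emptyset,S}] &= \sum_{k=1}^{\infty} E_\emptyset\left[ \log \frac{f_{Y_k|A_k,\Gamma_k}(Y_k | A_k, \Gamma_k; \emptyset)}{f_{Y_k|A_k,\Gamma_k}(Y_k | A_k, \Gamma_k; S)} \right] \\
&= \sum_{k=1}^{\infty} E_\emptyset\left[ E_\emptyset\left[ \log \frac{f_{Y_k|A_k,\Gamma_k}(Y_k | A_k, \Gamma_k; \emptyset)}{f_{Y_k|A_k,\Gamma_k}(Y_k | A_k, \Gamma_k; S)} \,\middle|\, A_k, \Gamma_k \right] \right] \\
&= \frac{\mu^2}{2} \sum_{k=1}^{\infty} E_\emptyset\left[ \mathbf{1}\{A_k \in S\} \Gamma_k \right] ,
\end{aligned}$$

where the final steps rely simply on the Kullback-Leibler divergence between normal
random variables with the same variance and different means. At this point we need to
evaluate

$$\sup_{\mathcal{A}} \min_{S \in \mathcal{C}} \frac{\mu^2}{2} \sum_{k=1}^{\infty} E_\emptyset\left[ \mathbf{1}\{A_k \in S\} \Gamma_k \right] .$$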

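We need to solve the above optimization problem over the space of all possible sensing
strategies. Although this might seem rather involved, this optimization can be reduced
to a much simpler deterministic optimization problem. Begin by defining

$$b_i = \sum_{k=1}^{\infty} E_\emptyset\left[ \mathbf{1}\{A_k = i\} \Gamma_k \right] . \tag{3.12}$$

Note that this definition does not depend on $S$, as the expectation is taken under the
null hypothesis. Furthermore $b_i \ge 0$, and the sensing budget in the observation model
(2.2) implies that $\sum_{i=1}^{n} b_i \le m$. Therefore

$$\begin{aligned}
\sup_{\mathcal{A}} \min_{S \in \mathcal{C}} \frac{\mu^2}{2} \sum_{k=1}^{\infty} E_\emptyset\left[ \mathbf{1}\{A_k \in S\} \Gamma_k \right]
&= \frac{\mu^2}{2} \sup_{\mathcal{A}} \min_{S \in \mathcal{C}} \sum_{i \in S} \sum_{k=1}^{\infty} E_\emptyset\left[ \mathbf{1}\{A_k = i\} \Gamma_k \right] \\
&= \frac{\mu^2}{2} \sup_{b \in \mathbb{R}_+^n : \sum_{i=1}^{n} b_i \le m} \ \min_{S \in \mathcal{C}} \sum_{i \in S} b_i .
\end{aligned}$$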

We have now a relatively simple finite dimensional problem, where we seek to identify
the vector $b = (b_1, \ldots, b_n)$ maximizing a concave function. The solution of this problem
obviously depends on the exact structure of $\mathcal{C}$. Remarkably, for symmetric classes, the
solution is extremely simple and characterized in the first part of the following lemma,
proved in the Appendix.



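Lemma 3.1. Let $\mathcal{C}$ be a symmetric class. Let $\mathcal{S} = \bigcup_{S \in \mathcal{C}} S$. Then

$$\sup_{b \in \mathbb{R}_+^n : \sum_{i=1}^{n} b_i \le m} \ \min_{S \in \mathcal{C}} \sum_{i \in S} b_i = \frac{ms}{|\mathcal{S}|} ,$$

and

$$\sup_{b \in \mathbb{R}_+^n : \sum_{i=1}^{n} b_i \le m} \ \frac{1}{|\mathcal{C}|} \sum_{S \in \mathcal{C}} \sum_{i \in S} b_i = \frac{ms}{|\mathcal{S}|} ,$$

and in both cases the solution is attained taking $b_i = m/|\mathcal{S}|$ for $i \in \mathcal{S}$ and zero otherwise.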
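
The first part of the lemma can be explored by brute force on a toy instance. The sketch below takes the class of all subsets of a fixed cardinality (which is symmetric and full range), evaluates the worst-case mass of the uniform allocation, and compares it with randomly drawn allocations of the same budget. The sizes, the helper names `worst_case_mass` and `random_allocation`, and the number of trials are assumptions made for illustration; a random search of this kind is of course not a proof.

```python
import itertools
import random

def worst_case_mass(b, subsets):
    """min over sets S in the class of the total allocation placed on S."""
    return min(sum(b[i] for i in S) for S in subsets)

n, s, m = 6, 2, 6.0                                   # hypothetical toy sizes
subsets = list(itertools.combinations(range(n), s))   # all subsets of size s

uniform = [m / n] * n
best = worst_case_mass(uniform, subsets)              # value claimed optimal: m*s/n

rng = random.Random(1)

def random_allocation():
    """A random non-negative allocation whose entries sum to the budget m."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [m * w / total for w in weights]

trials = [worst_case_mass(random_allocation(), subsets) for _ in range(1000)]
```

No random allocation should beat the uniform one here: the worst-case set collects the $s$ smallest entries of $b$, whose sum cannot exceed $s$ times the average entry.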

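We are now in place to prove the theorem: by putting together the likelihood ratio
lower bound (3.10) and the above upper bound we get

$$\frac{\mu^2}{2} \frac{ms}{|\mathcal{S}|} \ge \log\frac{1}{2\epsilon} ,$$

which is equivalent to

$$\mu \ge \sqrt{\frac{2|\mathcal{S}|}{sm} \log\frac{1}{2\epsilon}} ,$$

concluding the proof. □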
Lower bounds for adaptive sensing in settings other than the one in this paper have
been derived previously. For instance in Castro and Nowak (2008) a minimax characteri- 
zation of the fundamental performance limits of active learning for a binary classification 
problem was provided. Such results were made possible by bringing together approxi- 
mation results for smooth functional spaces and classical minimax bounding techniques 
(as in Tsybakov (2009)), modified to incorporate the sequential experimental design as- 
pect of the problem. In that approach the functional approximation results played the 
prominent role, and the stochastic part of the error had a much smaller contribution. 
Unfortunately this is not the case for the setting considered in the current paper and 
previously existing approaches were not adequate, prompting the novel approach in this 
paper. 

The proof of this theorem can be adapted for the other two risk definitions (3.3) and 
(3.4), and we can show that the risk behavior is qualitatively the same. These results are 
stated in the following proposition, proved in the Appendix. 

Proposition 3.1. Consider the setting of Theorem 3.1 and let $0 < \epsilon < 1$. If $\bar R(\hat\Phi) \le \epsilon/2$
(the risk in (3.3)) or $\tilde R(\hat\Phi) \le \epsilon$ (the risk in (3.4)) then the conclusion of Theorem 3.1 is
still valid and the lower bound (3.5) holds.

3.2. Tightness of the Detection Lower Bounds 

We now proceed to show that the lower bounds derived above are indeed tight, in the
sense that there are adaptive sensing testing procedures which are able to nearly attain
them. As we saw, for symmetric classes $\mathcal{C}$, extra class structure does not help.
Therefore we focus exclusively on the largest class of all the subsets of $\{1, \ldots, n\}$ with
cardinality $s$. In Haupt, Castro and Nowak (2011) a procedure called Distilled Sensing
(DS) was introduced, and the authors proved that for the detection problem described
above this procedure is able to asymptotically drive the risk to zero when $\mu \ge 4\sqrt{n/m}$
and $\log\log\log n < s < n^{1-\beta}$ for some $\beta \in (0,1)$. When comparing this result to the
above lower bound we see that there is a huge gap, as we would expect the signal magnitude
$\mu$ to scale essentially like $\sqrt{2n/(sm)}$. However, it is important to note that DS is
entirely agnostic about the sparsity level and possible signal magnitude. An alternative
non-agnostic methodology can be derived using DS as a black-box, which nearly achieves
the lower bounds of the previous section.

We begin by formally stating the performance results for the DS procedure. The following
proposition is essentially the second part of Theorem III.1 in Haupt, Castro and Nowak
(2011).

Proposition 3.2 (from Haupt, Castro and Nowak (2011)$^{2}$). Assume $\log\log\log n < s < n^{1-\beta}$,
for some $\beta \in (0, 1)$. Furthermore let $\mu \ge 4\sqrt{n/m}$. There is a sensing strategy
$\mathcal{A}_{DS}$ and a test function $\hat\Phi_{DS}$ such that

$$R(\hat\Phi_{DS}) \to 0 ,$$

as $n \to \infty$.

$^{2}$The sparsity lower bound condition $\log\log\log n < s$ is not stated in the theorem in
Haupt, Castro and Nowak (2011) for presentation reasons, and the discussion on the validity of the
result for $\log\log\log n < s$ appears only in the last paragraph of Section VI.



Note that this result is valid even if $s \approx \log\log\log n$, meaning $s$ is nearly asymptotically
constant. This suggests the following modification: first randomly select $\tilde n$ elements of
$\{1, \ldots, n\}$ without replacement. Denote these by $\mathcal{E} = \{E_1, \ldots, E_{\tilde n}\}$. Our sensing strategy
will focus exclusively on the entries $\mathcal{E}$ and ignore all the remaining ones. In other words,
our observation model is now

$$Y_k = x_{E_{A_k}} + \Gamma_k^{-1/2} W_k \quad \forall k \in \{1, 2, \ldots\} ,$$

where $A_k \in \{1, \ldots, \tilde n\}$. The sensing budget is, however, the same as in the original
formulation

$$\sum_{k=1}^{\infty} \Gamma_k \le m .$$

In summary, we have exactly the same setting as before, but the extrinsic dimension
$n$ is now replaced by the smaller $\tilde n$. Now, provided we choose $\tilde n$ large enough so that
the conditions of Proposition 3.2 are met for this new setting then an improvement in
performance is possible, yielding the following result.



Proposition 3.3. Assume $s > \log\log\log n$. Furthermore let

$$\mu \ge \sqrt{\frac{32 n \log\log\log n}{sm}} .$$

There is an adaptive sensing testing strategy such that

$$R(\hat\Phi) \to 0 ,$$

as $n \to \infty$.



This result means that the statement of Corollary 3.1 is essentially tight, at least 
provided there are more than log log log n signal components under the alternative hy- 
pothesis. The constant in the bound is certainly not optimal, and the factor log log log n 
is (possibly) an artifact of the procedure. Closing the small gap between the upper and 
lower bounds is, however, still a direction for future research. 

Remark 3.2. The results above were derived assuming the non-zero signal components
are positive. Qualitatively these results remain the same even if one allows both positive
and negative components. A simple way to address this setting is to write $x$ as $x =
x^{+} - x^{-}$, where $x^{+}$ and $x^{-}$ are sparse signal vectors with positive components (and the
joint number of non-zero components is simply $s$). Now we can split the sensing budget
into two equal parts, and make use of each one to test for the presence/absence of either
signal. This approach yields the same asymptotic behavior, and will at most result in
larger constants in the bounds.
Also note that, in principle, a procedure in the spirit of the one introduced in Chernoff
(1959) could be used to construct an adaptive sensing and testing methodology. However,
the method of analysis in that paper is not adequate to deal with our setting. Nevertheless
such a procedure seems to work extremely well based on a short simulation study we
conducted, and its analytical characterization presents an interesting direction for future
work.

Proof of Proposition 3.3. The idea is simply to use the construction above, with
$\tilde n = \frac{2n}{s} \log\log\log n$. Because of the random entry selection step (the choice of $\mathcal{E}$) the
conditions of Proposition 3.2 might not always be satisfied. However this happens with
very low probability. Define $\tilde x \in \mathbb{R}^{\tilde n}$ where $\tilde x_i = x_{E_i}$, $i = 1, \ldots, \tilde n$. Suppose $x$ has $s$
non-zero components, and let $\tilde s$ be the number of non-zero components of $\tilde x$. Because of
the sampling without replacement process, $\tilde s$ is a hypergeometric random variable with
mean

$$E[\tilde s] = \tilde n \frac{s}{n} = 2 \log\log\log n ,$$

and variance

$$V(\tilde s) = \tilde n \frac{s}{n} \left( 1 - \frac{s}{n} \right) \frac{n - \tilde n}{n - 1} \le \tilde n \frac{s}{n} = 2 \log\log\log n .$$

This means that

$$\begin{aligned}
P(\tilde s < \log\log\log n) &= P(\tilde s - E[\tilde s] < \log\log\log n - E[\tilde s]) \\
&= P(\tilde s - E[\tilde s] < -\log\log\log n) \\
&\le P(|\tilde s - E[\tilde s]| > \log\log\log n) \\
&\le \frac{V(\tilde s)}{(\log\log\log n)^2} \\
&\le \frac{2}{\log\log\log n} ,
\end{aligned}$$

where we used Chebyshev's inequality on the second-to-last step. This means that, with
probability at least $1 - 2/\log\log\log n$ the conditions of Proposition 3.2 are fulfilled. For
convenience define the event $\Omega = \{\tilde s \ge \log\log\log n\}$. Since the detection risk is always
bounded by 2 we have

$$R(\hat\Phi) \le R(\hat\Phi \mid \Omega) + 2 P(\Omega^c) \le R(\hat\Phi \mid \Omega) + \frac{4}{\log\log\log n} ,$$

where $R(\hat\Phi \mid \Omega)$ denotes the risk conditionally on $\Omega$;
therefore it suffices to show that, conditionally on $\Omega$, the risk of our procedure vanishes
asymptotically. From Proposition 3.2 we know that if $\mu \ge 4\sqrt{\tilde n/m}$ the detection risk
converges to zero, which immediately yields

$$\mu \ge 4 \sqrt{\frac{2 n \log\log\log n}{sm}} ,$$

concluding the proof. □
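
The hypergeometric mean, variance and Chebyshev step used in the proof can be verified exactly on a small example. In the sketch below the population size, the number of non-zero entries and the number of selected entries are hypothetical values chosen so that everything can be enumerated; the threshold is set to half the mean, mirroring the role of $\log\log\log n$ relative to $E[\tilde s] = 2\log\log\log n$. The helper names are assumptions for this example only.

```python
import math

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (math.comb(successes, k) * math.comb(population - successes, draws - k)
            / math.comb(population, draws))

def hypergeom_moments(population, successes, draws):
    """Exact pmf, mean and variance obtained by enumerating the support."""
    lo = max(0, draws - (population - successes))
    hi = min(successes, draws)
    pmf = {k: hypergeom_pmf(k, population, successes, draws) for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return pmf, mean, var

n, s, n_tilde = 200, 20, 50                    # hypothetical toy sizes
pmf, mean, var = hypergeom_moments(n, s, n_tilde)
closed_form_var = n_tilde * (s / n) * (1 - s / n) * (n - n_tilde) / (n - 1)

threshold = mean / 2                           # plays the role of log log log n
lower_tail = sum(p for k, p in pmf.items() if k < threshold)
chebyshev = var / threshold ** 2               # Chebyshev bound on the lower tail
```

On this instance the enumerated moments agree with the closed forms, the variance is dominated by the mean, and the exact lower tail sits below the Chebyshev bound, as the proof requires.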
4. Signal Estimation

In this section we consider the signal estimation problem, where the goal is to identify the
support of the underlying signal $x$ as accurately as possible. As in the detection case, we
are interested in characterizing the minimum signal amplitude $x_{\min}$ for which estimation
is still possible. Clearly estimation is statistically more "difficult" than signal detection,
and therefore the requirements on $x_{\min}$ are more stringent in this case. Nevertheless
we show that the dependence on the extrinsic dimension $n$ does not play a role in the
asymptotic performance bounds.

For the same reasons as in the previous section we focus our attention on the signal
model in (3.1). Our main goal is the estimation of the signal support set $S = \{i : x_i \neq 0\}$.
In other words, our goal is to use adaptive sensing observations to construct an estimate
$\hat S$ which is "close" to $S$. The metric of interest is the cardinality of the symmetric set
difference

$$d(\hat S, S) = |\hat S \Delta S| = |(\hat S \cap S^c) \cup (\hat S^c \cap S)| ,$$

where $S^c$ denotes the complement of $S$ in $\{1, \ldots, n\}$. Clearly $d(\hat S, S)$ is just the number
of errors in the estimate $\hat S$. In a similar spirit to that of the previous section, we want to
determine how small the signal magnitude $\mu$ can be so that

$$\max_{S \in \mathcal{C}} E_S[d(\hat S, S)] \le \epsilon , \tag{4.1}$$

where $\mathcal{C}$ is a class of sets, and $\epsilon > 0$ is small. A different error metric which is also popular
in the literature is $P_S(\hat S \neq S)$, that is, the probability one does not achieve exact support
estimation. Clearly

$$P_S(\hat S \neq S) \le E_S[d(\hat S, S)] ,$$

and therefore this is a less stringent metric. The tools developed in this paper pertain to
$E_S[d(\hat S, S)]$ and it is not clear if adaptive sensing lower bounds about $P_S(\hat S \neq S)$ can be
derived easily using a similar approach.

In addition, we will also consider a different support estimation risk function. Define
the False Discovery Rate (FDR) and the Non-Discovery Rate (NDR) as

$$\mathrm{FDR}(\hat S, S) = E_S\left[ \frac{|\hat S \setminus S|}{|\hat S|} \right]$$

and

$$\mathrm{NDR}(\hat S, S) = E_S\left[ \frac{|S \setminus \hat S|}{|S|} \right] .$$

In the above definitions we use the convention $0/0 = 0$. Ideally we want both these quantities to be
as small as possible, and so we can naturally define the risk

$$R_{\mathrm{FDR+NDR}}(\hat S) = \max_{S \in \mathcal{C}} \left\{ \mathrm{FDR}(\hat S, S) + \mathrm{NDR}(\hat S, S) \right\} .$$

Obviously $E_S[d(\hat S, S)] \ge \mathrm{FDR}(\hat S, S) + \mathrm{NDR}(\hat S, S)$ and these two measures of error can
be dramatically different, therefore controlling the risk $R_{\mathrm{FDR+NDR}}(\hat S)$ is significantly
easier than controlling the absolute number of errors.
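
For a single realization of the estimate, the quantities inside the expectations above are straightforward to compute. The following sketch uses Python sets; the function names are hypothetical, and the proportions it returns are per-realization values (the FDR and NDR are their expectations under the sensing and noise randomness).

```python
def symmetric_difference_size(s_hat, s):
    """d(S_hat, S): number of entries on which the estimate and the truth disagree."""
    return len(set(s_hat) ^ set(s))

def false_discovery_proportion(s_hat, s):
    """|S_hat minus S| / |S_hat|, with the convention 0/0 = 0."""
    s_hat, s = set(s_hat), set(s)
    return len(s_hat - s) / len(s_hat) if s_hat else 0.0

def non_discovery_proportion(s_hat, s):
    """|S minus S_hat| / |S|, with the convention 0/0 = 0."""
    s_hat, s = set(s_hat), set(s)
    return len(s - s_hat) / len(s) if s else 0.0

true_support = {1, 2, 3, 4}
estimate = {3, 4, 5}            # one false discovery (5), two misses (1 and 2)
errors = symmetric_difference_size(estimate, true_support)
fdp = false_discovery_proportion(estimate, true_support)
ndp = non_discovery_proportion(estimate, true_support)
```

In this example the number of errors is three while the two proportions sum to less than one, which illustrates how much weaker the proportion-based criterion can be.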

Our original goal is to study lower bounds for the class $\mathcal{C}$ of all subsets of $\{1, \ldots, n\}$
with cardinality $s$. For technical reasons this is a bit challenging, and to greatly simplify
the analysis we consider a different setting that nonetheless captures the essence of the
problem. Let $\mathcal{C}'$ denote the class consisting of sets of cardinality $s$, $s + 1$ and $s - 1$.
This class is only "slightly" bigger than $\mathcal{C}$. We instead consider procedures that exhibit
good performance when $S \in \mathcal{C}'$, that is, estimation procedures that are "very mildly"
adaptive to unknown sparsity. Generalization of the results to other classes of sets shall
be considered in future work and is out of the scope of this paper.

To aid in the presentation we introduce some new notation. Namely let 5,; = l{i G 5}. 
Similarly, for any estimator S let Si = l{i S S}. Note that the joint description of Si for 
all i is equivalent to S. For analysis purposes it is convenient to consider only symmetric 
procedures, meaning that for any S G C 

ViJ e S Ps(5, ^ 1) = PsiSj ^ 1) , (4.2) 

and 

i S Ps(5, ^ 0) = Ps{S, + 0) . (4.3) 

Although this might seem overly restrictive, it is indeed not the case. Any inference
procedure can be "symmetrized" without increasing its maximal risk. In other words,
given an estimator $\hat S$ we can construct another estimator $\hat S^{(\mathrm{perm})}$ satisfying (4.2) and
(4.3) and such that

$$\mathbb{E}_S[d(\hat S^{(\mathrm{perm})}, S)] \leq \max_{S' \in \mathcal{C}'} \mathbb{E}_{S'}[d(\hat S, S')] ,$$

for all sets $S \in \mathcal{C}'$. The symmetrization is achieved by randomization. Let
$\mathrm{perm} : \{1, \ldots, n\} \to \{1, \ldots, n\}$ be a permutation of $\{1, \ldots, n\}$ chosen uniformly at
random among the set of $n!$ possible permutations. Let $\hat S$ be a particular estimator we
are going to symmetrize. Proceed by exchanging the identity of the entries of $x$ using
this permutation, or equivalently by taking $x^{(\mathrm{perm})}_k = x_{\mathrm{perm}^{-1}(k)}$, and use
the estimator $\hat S$ on the collected data. Finally reverse the permutation, namely defining
$\hat S^{(\mathrm{perm})}_i = \hat S_{\mathrm{perm}(i)}$ for $i \in \{1, \ldots, n\}$. Using this construction we get the following
lemma, proved in the Appendix.
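
The randomization step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: an estimator is modelled as a function that receives a `measure(k)` callback (returning an observation of entry `k`) and returns a set of indices, and the names `symmetrize`, `measure` and `estimator` are hypothetical, not notation from the paper.

```python
import random


def symmetrize(estimator, n, rng):
    """Wrap `estimator` so that it acts on randomly relabelled entries."""
    perm = list(range(n))
    rng.shuffle(perm)                      # perm[i] plays the role of perm(i)
    inverse = [0] * n
    for i, k in enumerate(perm):
        inverse[k] = i                     # inverse[k] plays the role of perm^{-1}(k)

    def symmetrized(measure):
        # Entry k of the relabelled signal is entry perm^{-1}(k) of the original.
        s_hat = estimator(lambda k: measure(inverse[k]))
        # Undo the relabelling: index i is selected iff perm(i) was selected.
        return {i for i in range(n) if perm[i] in s_hat}

    return symmetrized


# Hypothetical usage with noise-free measurements of a fixed signal.
signal = [0.0, 3.0, 0.0, 3.0, 0.0]
threshold_rule = lambda measure: {k for k in range(5) if measure(k) > 1.0}
recovered = symmetrize(threshold_rule, 5, random.Random(0))(lambda i: signal[i])
```

A data-driven rule such as `threshold_rule` returns the same support after symmetrization, whereas a rule that favours particular indices regardless of the data has that preference spread uniformly over all indices.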

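Lemma 4.1. Let $\hat S$ be any adaptive sensing procedure. The random symmetrization
approach described in the paragraph above yields another adaptive sensing procedure
$\hat S^{(\mathrm{perm})}$ such that, for any $S \in \mathcal{C}'$,

$$\forall i \in S \quad \mathbb{P}_S(\hat S^{(\mathrm{perm})}_i \neq 1) = \frac{1}{|S| \binom{n}{|S|}} \sum_{S' \in \mathcal{C}' : |S'| = |S|} \; \sum_{j \in S'} \mathbb{P}_{S'}(\hat S_j \neq 1) ,$$

$$\forall i \notin S \quad \mathbb{P}_S(\hat S^{(\mathrm{perm})}_i \neq 0) = \frac{1}{(n - |S|) \binom{n}{|S|}} \sum_{S' \in \mathcal{C}' : |S'| = |S|} \; \sum_{j \notin S'} \mathbb{P}_{S'}(\hat S_j \neq 0) .$$

In addition, the following is also true:

$$\mathbb{E}_S[d(\hat S^{(\mathrm{perm})}, S)] \leq \frac{1}{\binom{n}{|S|}} \sum_{S' \in \mathcal{C}' : |S'| = |S|} \mathbb{E}_{S'}[d(\hat S, S')] \leq \max_{S' \in \mathcal{C}'} \mathbb{E}_{S'}[d(\hat S, S')] .$$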



This ensures that without loss of generality we can consider only symmetric procedures.
It is important to note that this approach is valid only if the class $\mathcal{C}'$ is
invariant under permutations. Finally, for symmetric procedures the lower bounds we
derive are also applicable to measures of risk different than (4.1), such as the average
estimation risk $\sum_{S' \in \mathcal{C}'} \mathbb{E}_{S'}[d(\hat S, S')]$.



4.1. Main Results - Estimation 



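Theorem 4.1. Let $\mathcal{C}'$ denote the class of all subsets of $\{1, \ldots, n\}$ with cardinality $s$,
$s + 1$ and $s - 1$. Let $\hat S = \hat S(D)$ be an arbitrary adaptive sensing estimator, where
$D = \{Y_k, A_k, \Gamma_k\}_{k=1}^{\infty}$. If

$$\max_{S \in \mathcal{C}'} \mathbb{E}_S[d(\hat S, S)] \leq \varepsilon ,$$

where $0 < \varepsilon < 1$ then necessarily

$$\mu \geq \sqrt{ \frac{2n}{m} \left( \log s + \log \frac{n - s}{n + 1} + \log \frac{1}{2\varepsilon} \right) } .$$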


The proof of the theorem is presented at the end of this section. As before it is useful
to look at the asymptotic behavior, and the case $s \ll n$ is particularly interesting.
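
To get a feeling for the size of the bound in Theorem 4.1, the following Python sketch evaluates its right-hand side for given $n$, $s$, $m$ and $\varepsilon$. The function name is an illustrative assumption; when the expression under the square root is negative the bound carries no information, and the sketch returns 0 in that case.

```python
import math


def theorem_4_1_lower_bound(n, s, m, eps):
    """Right-hand side of the bound in Theorem 4.1 (0.0 if it is vacuous)."""
    inner = (math.log(s)
             + math.log((n - s) / (n + 1))
             + math.log(1.0 / (2.0 * eps)))
    return math.sqrt(2.0 * n / m * max(inner, 0.0))


# Hypothetical example: n = 1000 entries, s = 10 non-zero components,
# sensing budget m = n and target risk eps = 0.1.
bound = theorem_4_1_lower_bound(1000, 10, 1000.0, 0.1)
```

For these values the bound is roughly 2.79, and it decreases as the sensing budget $m$ grows, in agreement with the $\sqrt{n/m}$ scaling.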

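Corollary 4.1. Consider the setting of Theorem 4.1 and assume $s = o(n)$ as $n \to \infty$.
Let $\hat S_n$ be an arbitrary estimation procedure for which

$$\lim_{n \to \infty} \max_{S \in \mathcal{C}'} \mathbb{E}_S[d(\hat S_n, S)] = 0 .$$

Necessarily

$$\mu \geq \sqrt{ \frac{2n}{m} \left( \log s + \omega_n \right) } ,$$

where $\omega_n$ is a sequence for which $\lim_{n \to \infty} \omega_n = \infty$.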

For the FDR+NDR risk we can use the same proof approach to obtain a much less 
restrictive bound on the signal magnitude. 

Corollary 4.2. Consider the setting of Theorem 4.1 and assume $s = o(n)$. Let $\hat S_n$ be
an arbitrary estimation procedure such that

$$\lim_{n \to \infty} R_{\mathrm{FDR+NDR}}(\hat S_n, S) = 0 .$$

Necessarily

$$\mu \geq \omega_n \sqrt{\frac{n}{m}} ,$$

where $\omega_n$ is a sequence for which $\lim_{n \to \infty} \omega_n = \infty$.

A sketch of the proof of this corollary can be found in the Appendix.

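Proof of Theorem 4.1. The proof follows a similar approach as that of Theorem 3.1,
and capitalizes heavily on the symmetry of the estimation procedure. In light of Lemma 4.1
it suffices to consider symmetric procedures, that is, procedures that satisfy (4.2) and
(4.3). Let $S \in \mathcal{C}'$ be arbitrary and assume that

$$\mathbb{E}_S[d(\hat S, S)] \leq \varepsilon ,$$

where $0 < \varepsilon < 1$. Clearly

$$\mathbb{E}_S[d(\hat S, S)] = \mathbb{E}_S\left[ \sum_{i \in S} \mathbf{1}\{\hat S_i \neq 1\} + \sum_{i \notin S} \mathbf{1}\{\hat S_i \neq 0\} \right] = \sum_{i \in S} \mathbb{P}_S(\hat S_i \neq 1) + \sum_{i \notin S} \mathbb{P}_S(\hat S_i \neq 0) .$$

As we consider symmetric procedures we conclude that

$$\forall i \in S \quad \mathbb{P}_S(\hat S_i \neq 1) \leq \frac{\varepsilon}{|S|}$$

and

$$\forall i \notin S \quad \mathbb{P}_S(\hat S_i \neq 0) \leq \frac{\varepsilon}{n - |S|} .$$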

For our purposes it is convenient to re-write the likelihood ratio (3.7) as

$$\mathrm{LR}_{S,S'}(d) = \frac{f(d; S)}{f(d; S')} = \prod_{i=1}^{n} \; \prod_{k : a_k = i} \frac{f_{Y_k | A_k, \Gamma_k}(y_k | a_k, \gamma_k; S)}{f_{Y_k | A_k, \Gamma_k}(y_k | a_k, \gamma_k; S')} .$$

Now let $S \in \mathcal{C}'$ be an arbitrary set of cardinality $s$, and define $S^{(i)} \in \mathcal{C}'$ to be

$$S^{(i)} = \begin{cases} S \setminus \{i\} & \text{if } i \in S \\ S \cup \{i\} & \text{if } i \notin S \end{cases} ,$$

in words, we either remove element $i$ if $i \in S$, or add it otherwise, meaning that
$S \Delta S^{(i)} = \{i\}$. We proceed in a similar way as we did in the signal detection scenario.
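Let $i \in \{1, \ldots, n\}$ be arbitrary. We conclude that

$$\forall i \in S \quad \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] \geq -\log\left( 2\mathbb{P}_S(\hat S_i \neq 1) + 2\mathbb{P}_{S^{(i)}}(\hat S_i \neq 0) \right)$$

and

$$\forall i \notin S \quad \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] \geq -\log\left( 2\mathbb{P}_S(\hat S_i \neq 0) + 2\mathbb{P}_{S^{(i)}}(\hat S_i \neq 1) \right) .$$

We now take advantage of the symmetry of the estimator, to conclude that

$$\forall i \in S \quad \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] \geq -\log\left( \frac{2\varepsilon}{s} + \frac{2\varepsilon}{n - s + 1} \right) \tag{4.4}$$

and

$$\forall i \notin S \quad \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] \geq -\log\left( \frac{2\varepsilon}{n - s} + \frac{2\varepsilon}{s + 1} \right) . \tag{4.5}$$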



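Now that we have lower bounds for $\mathbb{E}_S[\log \mathrm{LR}_{S,S^{(i)}}]$ we need to evaluate this quantity
in terms of $\mu$. This is easily done by noting that for $i \in \{1, \ldots, n\}$

$$\mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] = \mathbb{E}_S\left[ \sum_{k : A_k = i} \log \frac{f_{Y_k | A_k, \Gamma_k}(Y_k | A_k, \Gamma_k; S)}{f_{Y_k | A_k, \Gamma_k}(Y_k | A_k, \Gamma_k; S^{(i)})} \right] = \mathbb{E}_S\left[ \sum_{k : A_k = i} \frac{\mu^2}{2} \Gamma_k \right] .$$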
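
The per-measurement contribution $\Gamma_k \mu^2 / 2$ is the expected log-likelihood ratio between two Gaussians with common precision $\Gamma_k$ and means differing by $\mu$. The following Python sketch checks this numerically with a simple trapezoid rule; the function name, the integration range and the grid size are illustrative choices, not part of the original argument.

```python
import math


def expected_log_lr(mu, gamma, half_width=12.0, steps=20_000):
    """Numerically integrate E[log(f_mu(Y) / f_0(Y))] for Y ~ N(mu, 1/gamma)."""
    sigma = 1.0 / math.sqrt(gamma)
    lower = mu - half_width * sigma
    step = 2.0 * half_width * sigma / steps
    total = 0.0
    for j in range(steps + 1):
        y = lower + j * step
        density = math.sqrt(gamma / (2.0 * math.pi)) * math.exp(-gamma * (y - mu) ** 2 / 2.0)
        log_ratio = gamma * (y ** 2 - (y - mu) ** 2) / 2.0   # log f_mu(y) - log f_0(y)
        weight = 0.5 if j in (0, steps) else 1.0             # trapezoid end-point weights
        total += weight * density * log_ratio * step
    return total


# Hypothetical values: signal magnitude 1.5 measured with precision 2.
numerical = expected_log_lr(1.5, 2.0)
closed_form = 2.0 * 1.5 ** 2 / 2.0
```

The numerical value agrees with the closed form $\Gamma \mu^2 / 2$, which is why only the total precision spent on entry $i$ matters in the expression above.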



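Note that we cannot yet evaluate the above expression, as one cannot invoke the sensing
budget constraint (2.2). This can be addressed by summing each of the above terms over
$i \in \{1, \ldots, n\}$. On one hand

$$\sum_{i=1}^{n} \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] = \mathbb{E}_S\left[ \sum_{i=1}^{n} \sum_{k : A_k = i} \frac{\mu^2}{2} \Gamma_k \right] = \mathbb{E}_S\left[ \sum_{k=1}^{\infty} \frac{\mu^2}{2} \Gamma_k \right] \leq \frac{m \mu^2}{2} . \tag{4.6}$$

On the other hand

$$\sum_{i=1}^{n} \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] = \sum_{i \in S} \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right] + \sum_{i \notin S} \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right]$$

$$\geq -s \log\left( \frac{2\varepsilon}{s} + \frac{2\varepsilon}{n - s + 1} \right) - (n - s) \log\left( \frac{2\varepsilon}{n - s} + \frac{2\varepsilon}{s + 1} \right) .$$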

We can get a more insightful bound by reorganizing the various terms

$$\begin{aligned}
\sum_{i=1}^{n} \mathbb{E}_S\left[\log \mathrm{LR}_{S,S^{(i)}}\right]
&\geq n \log \frac{1}{2\varepsilon} + s \log \frac{s(n - s + 1)}{n + 1} + (n - s) \log \frac{(n - s)(s + 1)}{n + 1} \\
&= n \log \frac{1}{2\varepsilon} + s \log s + (n - s) \log(s + 1) + s \log \frac{n - s + 1}{n + 1} + (n - s) \log \frac{n - s}{n + 1} \\
&\geq n \left( \log s + \log \frac{n - s}{n + 1} + \log \frac{1}{2\varepsilon} \right) ,
\end{aligned}$$

where the last inequality follows by noting that $\log(s + 1) > \log s$ and $\log(n - s + 1) >
\log(n - s)$. Using this together with (4.6) concludes the proof. □
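
The reorganization above is elementary but easy to get wrong, so a quick numerical check may be reassuring. The Python sketch below evaluates the sum of the bounds (4.4) and (4.5) and the simplified expression $n(\log s + \log\frac{n-s}{n+1} + \log\frac{1}{2\varepsilon})$; the function names are illustrative assumptions.

```python
import math


def summed_lower_bound(n, s, eps):
    """Sum over all i of the right-hand sides of (4.4) and (4.5)."""
    in_support = -s * math.log(2 * eps / s + 2 * eps / (n - s + 1))
    off_support = -(n - s) * math.log(2 * eps / (n - s) + 2 * eps / (s + 1))
    return in_support + off_support


def simplified_lower_bound(n, s, eps):
    """The final, simplified bound used together with (4.6)."""
    return n * (math.log(s) + math.log((n - s) / (n + 1)) + math.log(1 / (2 * eps)))


def slack(n, s):
    """Amount lost when replacing log(s+1) by log(s) and log(n-s+1) by log(n-s)."""
    return (n - s) * math.log((s + 1) / s) + s * math.log((n - s + 1) / (n - s))


# Hypothetical example values.
gap = summed_lower_bound(1000, 10, 0.01) - simplified_lower_bound(1000, 10, 0.01)
```

The gap between the two expressions equals `slack(n, s)`, which is positive whenever $1 \leq s < n$, so the simplification only weakens the bound slightly.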

4.2. Tightness of the Estimation Lower Bounds 

Similarly to what happened in the detection setting, the lower bounds derived for
estimation are also tight, in the sense that there are inference procedures able to achieve
them. In Malloy and Nowak (2011b,a) a slightly different problem was considered, where
each measurement had the same accuracy/precision and one desired to control the total
number of errors in $\hat S$. Their results were stated in terms of conditions on the signal
magnitude $\mu$ that were necessary to ensure the risk converged to zero. In their setting there
is no strict sensing budget, but instead only control over the expected precision budget
used. In other words, the procedures in Malloy and Nowak (2011b,a) do not always satisfy
the sensing budget in equation (2.2), but instead satisfy an expected sensing budget
constraint

$$\mathbb{E}\left[ \sum_{k=1}^{\infty} \Gamma_k \right] \leq m .$$

Such methods can be modified to ensure that the sensing budget (2.2) is fulfilled with
increasingly high probability (as $n$ grows) without altering their asymptotic performance
behavior, and we can state the following result, proved in the Appendix.

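Proposition 4.1. Assume $s + 1 < \frac{n}{\log_2^2 n - 3}$. Let

$$\mu \geq \sqrt{ \frac{4n}{m} \left( 2 \log(s + 1) + 5 \log \log_2 n \right) } .$$

There is a sensing and estimation strategy yielding an estimator $\hat S$ such that

$$\max_{S \in \mathcal{C}'} \mathbb{E}_S[d(\hat S, S)] \to 0 ,$$

as $n \to \infty$.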

This means that provided $x_{\min}$ is of the order $\sqrt{(n/m)(\log s + \log \log n)}$ we can
ensure exact recovery of a sufficiently sparse signal support with probability approaching
1. The constants in the above result are rather loose, and can be made much tighter (see
Malloy and Nowak (2011b)). The $\log \log n$ term is an artifact of this method (which is
parameter adaptive and agnostic about $s$). This term can be entirely avoided by considering
another procedure, namely by executing in parallel $n$ properly calibrated sequential
likelihood ratio tests, which requires the knowledge of the sparsity level $s$. Such a
procedure achieves precisely the bound in Corollary 4.1. Lower bounds for estimation have
been derived under a different set of assumptions for the class of entry-wise sequential
tests in Malloy and Nowak (2011a). In contrast the results in the current paper pertain to
any adaptive sensing procedure (and not only entry-wise testing procedures).

Control of the FDR+NDR risk was considered in Haupt, Castro and Nowak (2011) in
the exact setting described in this paper, and the distilled sensing procedure proposed
there is able to achieve the bound in Corollary 4.2 provided $\log \log \log n < s < n^{1 - \beta}$ for
some $0 < \beta < 1$. Therefore the lower bounds on the FDR+NDR risk are also tight for a
wide range of sparsity levels.

4.3. Relation to Compressed Sensing 

The proof technique used in Theorem 4.1 also provides some important insights for the
problem of adaptive compressive sensing. This setting is different than the one considered
so far and the observation model is now of the form

$$Y = Ax + W ,$$

where $Y \in \mathbb{R}^{\ell}$ denotes the observations, $A \in \mathbb{R}^{\ell \times n}$ is the design/sensing matrix,
$x \in \mathbb{R}^{n}$ is the unknown signal, and $W \in \mathbb{R}^{\ell}$ is Gaussian with zero mean and identity
covariance matrix. The rows of $A$ can be designed sequentially, and the $k$th row (denoted
by $A_{k\cdot}$) can depend explicitly on $\{Y_j, A_{j\cdot}\}_{j=1}^{k-1}$. Note that $W_k$ is a normal random
variable independent of $\{Y_j, A_{j\cdot}, W_j\}_{j=1}^{k-1}$ and also independent of $A_{k\cdot}$. This setting is
particularly interesting when we impose some constraints on $A$, namely

$$\mathbb{E}\left[ \|A\|_F^2 \right] \leq m ,$$

where $\| \cdot \|_F$ is the Frobenius matrix norm. Like (2.2), this sensing budget condition is
very natural and the issue of noise is irrelevant without it. Each row $A_{k\cdot}$ plays the role
of the sensing action $A_k$ in our original scenario, and $\|A_{k\cdot}\|_2^2$ plays the role of the
precision parameter $\Gamma_k$ in (2.2). As before, we do not impose any restrictions on the total
number of measurements, which can be potentially infinite. We can show the following result
using an approach similar to that of Theorem 4.1.

Proposition 4.2. Consider the adaptive compressed sensing setting as described above,
with observations $Y = Ax + W$, where $W$ is Gaussian zero mean with identity covariance
matrix and $\mathbb{E}[\|A\|_F^2] \leq m$. Let $\mathcal{H}(\mu) \subset \mathbb{R}^n$ be the class of all vectors $x$ with support in
$\mathcal{C}'$ (i.e. the support, defined as $\mathrm{supp}(x) = \{i : x_i \neq 0\}$, has cardinality $s$, $s + 1$ or
$s - 1$) and the magnitude of the minimum non-zero entries greater or equal than $\mu$. That is

$$\mathcal{H}(\mu) = \left\{ x \in \mathbb{R}^n : \mathrm{supp}(x) \in \mathcal{C}' \text{ and } \min_i \{ |x_i| : x_i \neq 0 \} \geq \mu \right\} .$$

Let $D = \{Y, A\}$ and $\hat S(D)$ be an arbitrary estimator. If

$$\max_{x \in \mathcal{H}(\mu)} \mathbb{E}_x[d(\hat S, S)] \leq \varepsilon \tag{4.7}$$

where $0 < \varepsilon < 1$ then necessarily

$$\mu \geq \sqrt{ \frac{2n}{m} \left( \log s + \log \frac{n - s}{n + 1} + \log \frac{1}{2\varepsilon} \right) } .$$



The proof of the proposition can be found in the Appendix. In Arias-Castro, Candes and
Davenport (2011) the authors derived lower bounds for both support recovery and mean
square error risk for adaptive compressive sensing. In their setting $\ell = m$, and each row of
the matrix $A$ has expected norm at most 1. These two constraints imply the Frobenius norm
constraint in Proposition 4.2. Theorem 2 in that paper states that the minimum signal
amplitude $x_{\min}$ must be greater than $\sqrt{n/m}$ to ensure that support recovery is possible
within the class of all possible $s$-sparse signals. In contrast, our result shows that lower
bound is not entirely tight. Formally, if $s = o(n)$ and

$$\lim_{n \to \infty} \max_{S \in \mathcal{C}'} \mathbb{E}_S[d(\hat S_n, S)] = 0$$

we have necessarily

$$\mu \geq \sqrt{ \frac{2n}{m} \left( \log s + \omega_n \right) } , \quad \text{where } \omega_n \to \infty$$

as $n \to \infty$. So, the above result improves the bound in Arias-Castro, Candes and Davenport
(2011) by a $\log s$ factor. In light of the recent results in Haupt et al. (2012) it seems
plausible that this is a necessary and sufficient term. However, a precise characterization
of these limits remains an open problem.

5. Conclusion

In this paper we presented several lower bounds for detection and estimation of sparse
signals using adaptive sensing. These results bridge a gap in our understanding of adaptive
sensing and show that methodologies recently proposed in the literature are nearly
optimal. A very interesting insight is that, for signal detection, the sparsity structure
is essentially irrelevant. The intuition is that for detection it suffices to identify one
non-zero component, and cues provided by the structure are not too useful under adaptive
sensing scenarios. However, for signal estimation it is not clear if structure helps,
which raises many interesting directions for future research.

Appendix



Proof of Lemma 3.1. : We begin by proving the first result. Let 

to/|S| if i e S 







otherwise 



1,. 



Begin by noticing that 



sup min > bi > min > 6- 
— ^ _ ^-^ — Sec ^ 



The proof proceeds by contradiction, and makes use of a probabilistic argument. Suppose 
there is a vector h* G such that X^ILi b* < fn and 



mm 

sec 



> 



(5.1) 



ies 



We show next that this is in contradiction with the symmetry assumption.
Let J be a uniform random variable with range S. Then 



(5.2) 



3 = 1 



Now construct another random variable K in a hierarchical fashion: first take S drawn 
uniformly over C, and given S take K drawn uniformly over S. Then clearly 



[bK\Sl 



E 



> E 



s 



IE': 

kes 

min — b 

sec s ^ 

kes 

min > b] 



sec 



kes 



> 



m 



(5.3) 



where the strict inequality follows from (5.1). To conclude the proof we just need to
notice that J and K have exactly the same distribution if the class C is symmetric. Let
k ∈ S be arbitrary. Then



F{K = k) = E[l{/1 = k}] 

= E[E[1{X = fells']] 

-l{fc e S} 
= ip(fc e S) 

s 

1 s _ 1 

Therefore both J and K are uniformly distributed over S and so $\mathbb{E}[b_J] = \mathbb{E}[b_K]$. This
creates a contradiction between (5.2) and (5.3), invalidating the existence of the vector
$b^*$ and concluding the proof.

For the second result note simply that 

^EE^^ = ^EEm{^^5} 

I ' Sec iss ' ' sec i=i 

n 

- E^^eEi{*^^} 

n 

i=i ' ' 

where the last step follows from the symmetry assumption. The result of the lemma is 
now immediate. 

□ 

Proof of Proposition 3.1. : If -/?($) < e/2 the result follows immediately from the 
simple fact that i?($) < 2E($). Therefore R{^) < e/2 implies that R{^) < e and we 
just apply the result of the theorem. For the second statement it is useful to look at S as 
a uniform random variable with range C. In the proof of Theorem 3.1 we showed that, 
for any S G C 

E0[logLR0,s|S] > - log (2P0(<I ^ 0) + 2Pi(<l ^ 1|S) 

where Pi denotes the probability measure under the alternative hypothesis. By taking 
the expectation on both sides we have 

E0[logLR0,s] > ^ log (2P0(<i. ^ 0) + 2Pi(<i. ^ 1\S) 
' ' sec 

To simplify the notation let po = P0(<I' ^ 0) and ps = Pi(l> ^ 1\S). The statement 
R{^) < e is equivalent to po + j^\J2secPs — ^- Accordingly define the constraint set 






V C Mi+ici as 



We have that 



'P = jpo, {Psjsec • -Po + X] - 



E0[logLR0_5.] > nhn S - j^r X! (2po + 2ps) 
I ' ' sec 



(5.4) 



= log^, (5.5) 

where the last step follows from a straightforward Lagrange multiplier argument, to 
conclude that the minimum is attained by taking po + ps ^ e for all S* € C 

The next step, similar to the proof of Theorem 3.1, is to solve sup^ E0[logLR0 g], 
where it is important to recall that 5* is random. Following the same approach as in the 
proof of the theorem yields 

supE0[logLR0^5] = — sup |7^X!X!^*' 

-4 ^ b6B+:Er=i6.=" secies 

where bi is defined in (3.12). The second result of Lemma 3.1 characterizes the solution 
of this optimization problem, and therefore 

LL^ms , 1 

Simple algebraic manipulation concludes the proof. □ 



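Proof of Lemma 4.1. To ease the notation let $\mathcal{C}_s$ denote the class of all subsets of
$\{1, \ldots, n\}$ with cardinality $s$. Let $S \in \mathcal{C}_s$ and $i \in S$ be fixed, but arbitrary. Note that
the permutation perm maps this set to another set $S^{(\mathrm{perm})} = \mathrm{perm}(S) \in \mathcal{C}_s$ with the same
cardinality. Furthermore, since the permutation is chosen uniformly over the set of all
permutations this set is uniformly distributed over $\mathcal{C}_s$, that is

$$S^{(\mathrm{perm})} \sim \mathrm{Unif}(\mathcal{C}_s) .$$

In addition define the random variable $J = \mathrm{perm}(i)$. This is obviously uniformly
distributed over $\{1, \ldots, n\}$. More importantly, conditionally on $S^{(\mathrm{perm})}$, $J$ is uniformly
distributed over the set $S^{(\mathrm{perm})}$. In other words, for arbitrary $k \in \{1, \ldots, n\}$

$$\mathbb{P}(J = k \,|\, S^{(\mathrm{perm})}) = \mathbb{P}(\mathrm{perm}(i) = k \,|\, S^{(\mathrm{perm})}) = \mathbb{P}(\mathrm{perm}^{-1}(k) = i \,|\, S^{(\mathrm{perm})}) = \begin{cases} 1/s & \text{if } k \in S^{(\mathrm{perm})} \\ 0 & \text{otherwise} \end{cases} .$$

Therefore

$$\mathbb{P}_S(\hat S^{(\mathrm{perm})}_i \neq 1) = \mathbb{P}(\hat S_{\mathrm{perm}(i)} \neq 1) = \mathbb{E}\left[ \mathbb{E}\left[ \mathbf{1}\{\hat S_{\mathrm{perm}(i)} \neq 1\} \,\middle|\, S^{(\mathrm{perm})} \right] \right] = \frac{1}{s \binom{n}{s}} \sum_{S' \in \mathcal{C}_s} \sum_{j \in S'} \mathbb{P}_{S'}(\hat S_j \neq 1) ,$$

where the last step follows from the distributions of $S^{(\mathrm{perm})}$ and $\mathrm{perm}(i)$. The proof
of the lemma statement for $i \notin S$ is entirely analogous. Finally, the last result in the
lemma follows trivially from the other two statements. □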

Sketch proof of Corollary 4.2. The result in the corollary follows in the same manner
as the result in Theorem 4.1, but noticing that for symmetric estimation procedures
the requirements on the estimator $\hat S_i$ for each $i \in \{1, \ldots, n\}$ are much less stringent. In
particular let $S \in \mathcal{C}'$ be arbitrary and assume that

$$R_{\mathrm{FDR+NDR}}(\hat S, S) \leq \varepsilon ,$$

where $\varepsilon > 0$, which implies that both FDR and NDR are less than $\varepsilon$. Now consider
symmetric procedures and let $\alpha = \mathbb{P}_S(\hat S_i \neq 0)$ for $i \notin S$ and $\beta = \mathbb{P}_S(\hat S_i \neq 1)$ for $i \in S$.
Clearly, the constraint on NDR implies that

$$\varepsilon \geq \mathrm{NDR}(\hat S, S) = \mathbb{E}_S\left[ \frac{|S \setminus \hat S|}{|S|} \right] = \frac{|S| \beta}{|S|} = \beta .$$

The constraint on FDR is a bit more difficult to analyze, due to the random denominator
in its definition. However, a very sloppy bound suffices, namely

$$\varepsilon \geq \mathrm{FDR}(\hat S, S) = \mathbb{E}_S\left[ \frac{|\hat S \setminus S|}{|\hat S|} \right] \geq \frac{1}{n} \mathbb{E}_S\left[ |\hat S \setminus S| \right] = \frac{(n - |S|) \alpha}{n} .$$

Therefore we conclude that a bound of the form $\alpha \leq \frac{n}{n - |S|} \varepsilon$ suffices. Note that this
is a very loose but nevertheless sufficient bound. The rest of the proof proceeds now in the
same fashion as Theorem 4.1 and Corollary 4.1. □



Proof of Proposition 4.2. The proof of this result mimics closely the proof of Theorem 4.1,
with the necessary changes to account for the different sensing model. The first
step is to reduce the class of signals under consideration. Clearly signals of the form (3.1)
are also in the class $\mathcal{H}(\mu)$. Therefore

$$\max_{x \in \mathcal{H}(\mu)} \mathbb{E}_x[d(\hat S, S)] \geq \max_{S \in \mathcal{C}'} \mathbb{E}_S[d(\hat S, S)] ,$$

where the expectation on the right-hand side is taken assuming $x$ is of the form (3.1)
with support $S$. Condition (4.7) therefore implies that

$$\max_{S \in \mathcal{C}'} \mathbb{E}_S[d(\hat S, S)] \leq \varepsilon ,$$

so, for the purpose of computing a lower bound it suffices to consider only the signals where
all the non-zero components are valued $\mu$. It is important to note that this subclass of
signals might not correspond to the "hardest" signals to estimate, and no claim is made
about this. However, this subclass seems to capture the essential aspects of the problem
in light of the bounds derived. As the class of signals under consideration is the same
as in Theorem 4.1 the only change in that proof stems from the different observation
model, which in turn results in a different log-likelihood ratio. Notice that, as before, we
can consider only symmetric procedures in the sense of Lemma 4.1.

To aid in the presentation let $A_{ij}$ denote the entry in the $i$th row and $j$th column of
the matrix $A$, and let $A_{i\cdot}$ and $A_{\cdot j}$ denote respectively the $i$th row and the $j$th column
of $A$. The log-likelihood ratio is therefore given by

$$\begin{aligned}
\log \mathrm{LR}_{S,S'}(Y, A) &= \log \frac{f(Y, A; S)}{f(Y, A; S')} \\
&= \sum_{k=1}^{\ell} \log \frac{f_{Y_k | A_{k\cdot}}(Y_k | A_{k\cdot}; S)}{f_{Y_k | A_{k\cdot}}(Y_k | A_{k\cdot}; S')} \\
&= \frac{1}{2} \sum_{k=1}^{\ell} \left[ \Big( Y_k - \mu \sum_{j \in S'} A_{kj} \Big)^2 - \Big( Y_k - \mu \sum_{j \in S} A_{kj} \Big)^2 \right] .
\end{aligned}$$

Given this, the expected log-likelihood ratio can be computed quite easily as before, and
we get

$$\mathbb{E}_S\left[ \log \mathrm{LR}_{S,S'}(Y, A) \right] = \frac{\mu^2}{2} \sum_{k=1}^{\ell} \mathbb{E}_S\left[ \Big( \sum_{j \in S} A_{kj} - \sum_{j \in S'} A_{kj} \Big)^2 \right] . \tag{5.6}$$

Now consider the sets $S^{(i)}$ as in the proof of Theorem 4.1. Since we have $S \Delta S^{(i)} = \{i\}$
we get from Equation (5.6)

$$\mathbb{E}_S\left[ \log \mathrm{LR}_{S,S^{(i)}}(Y, A) \right] = \frac{\mu^2}{2} \mathbb{E}_S\left[ \sum_{k=1}^{\ell} A_{ki}^2 \right] = \frac{\mu^2}{2} \mathbb{E}\left[ \|A_{\cdot i}\|_2^2 \right] . \tag{5.7}$$

From this point on the proof proceeds in exactly the same fashion as that of Theorem 4.1.
Begin by summing the terms (5.7) over $i \in \{1, \ldots, n\}$ to get an upper bound on the
expected likelihood ratio

$$\sum_{i=1}^{n} \mathbb{E}_S\left[ \log \mathrm{LR}_{S,S^{(i)}}(Y, A) \right] = \frac{\mu^2}{2} \sum_{i=1}^{n} \mathbb{E}\left[ \|A_{\cdot i}\|_2^2 \right] = \frac{\mu^2}{2} \mathbb{E}\left[ \|A\|_F^2 \right] \leq \frac{m \mu^2}{2} . \tag{5.8}$$

Finally, the lower bounds on the log-likelihood ratio in (4.4) and (4.5) are not dependent
on the nature of the likelihood ratio itself, but rather on the desired risk performance.
So these bounds are valid in the compressed sensing setting as well. As in the proof of
Theorem 4.1 using these lower bounds together with (5.8) concludes the proof. □



Proof of Proposition 4.1. We begin by introducing an algorithm that achieves the
desired performance bound. Algorithm 1 is described here for convenience of presentation
and explained in detail in the next paragraphs. It is essentially the algorithm presented
in Malloy and Nowak (2011a) for the case of Gaussian observation noise.



Algorithm 1: Simple Distilled Sensing.

Parameters: Number of steps $\ell$ and per-measurement precision $p$
Initialization:
    $k \leftarrow 0$, $i \leftarrow 1$, $\hat S \leftarrow \emptyset$
    $c_i \leftarrow 0$ for $i = 1, \ldots, n$
for $i \leftarrow 1$ to $n$ do
    repeat
        $k \leftarrow k + 1$
        $c_i \leftarrow c_i + 1$
        Measure $Y_k = Y_i^{(c_i)} = x_i + p^{-1/2} W_k$
        if $p(k + 1) > m$ then
            Terminate: Output $\hat S$
        end
    until $c_i = \ell$ or $Y_k < 0$;
    if $c_i = \ell$ and $Y_k \geq 0$ then
        $\hat S \leftarrow \hat S \cup \{i\}$
    end
end
Terminate: Output $\hat S$



Sensing is performed coordinate-wise in a sequential way, until all the signal entries
have been explored or the total sensing budget is exhausted. Note that all the measurements
are made with the same precision $p$. For each signal entry $i$ the algorithm performs
at most $\ell$ measurements. If any of these measurements is negative then entry $i$ is deemed
not to belong to the support estimate $\hat S$. If all the $\ell$ measurements are non-negative then
entry $i$ is deemed to belong to the support estimate. For convenience we identify the
measurements of entry $i$ by $Y_i^{(j)}$, where $j \in \{1, \ldots, \ell\}$.
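
As an illustration, the procedure just described can be written as a short Python sketch. This is a minimal rendering under our own naming (`simple_distilled_sensing`, `num_steps`, `precision` are illustrative identifiers), with standard Gaussian noise drawn from a user-supplied random generator; it is not meant as a tuned implementation.

```python
import math
import random


def simple_distilled_sensing(x, m, num_steps, precision, rng):
    """Test each entry of x with at most `num_steps` equally precise measurements."""
    k = 0                                  # total number of measurements so far
    s_hat = set()
    for i in range(len(x)):
        c_i = 0                            # measurements spent on entry i
        while True:
            k += 1
            c_i += 1
            y = x[i] + rng.gauss(0.0, 1.0) / math.sqrt(precision)
            if precision * (k + 1) > m:
                return s_hat               # sensing budget exhausted
            if c_i == num_steps or y < 0:
                break                      # stop probing entry i
        if c_i == num_steps and y >= 0:
            s_hat.add(i)                   # all measurements were non-negative
    return s_hat


# Hypothetical example: 64 entries, three strong components, a generous budget.
signal = [0.0] * 64
for index in (3, 17, 40):
    signal[index] = 50.0
estimate = simple_distilled_sensing(signal, m=1000.0, num_steps=36,
                                    precision=1.0, rng=random.Random(0))
```

In this example the components are so strong, and the budget so generous, that the estimate coincides with the true support except with negligible probability; the interesting regime analyzed in the proof is of course the one where $\mu$ is close to the stated threshold.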

In a sense the algorithm is a very crude version of a sequential likelihood ratio test.
Given that we are interested in the general rates of error decay we do not optimize the
algorithm parameters for performance and instead make crude choices that are sufficient
to prove the result. In particular we take $p = m/(4n)$ and $\ell = \log_2^2 n$.

The proof goes by showing first that, with high probability, the algorithm terminates
before reaching the total sensing budget. Therefore, for the analysis we consider a
modification of the algorithm where termination upon the event $p(k + 1) > m$ is removed. Note
that the number of measurements collected for entry $i$ is simply $c_i$. These are independent
random variables. The total number of measurements collected is $\sum_{i=1}^{n} c_i$. Note that for
all $i$ we have $1 \leq c_i \leq \ell$. Furthermore, note that for $i \notin S$ the corresponding
measurements $Y_i^{(j)}$ are zero mean normal random variables, which means that
$\mathbb{P}_S(Y_i^{(j)} < 0) = 1/2$. Therefore $c_i$ corresponds to a truncated geometric random variable:

$$i \notin S , \quad \mathbb{P}_S(c_i = x) = \begin{cases} (1/2)^{x} & \text{if } x = 1, \ldots, \ell - 1 \\ (1/2)^{\ell - 1} & \text{if } x = \ell \\ 0 & \text{otherwise} \end{cases}$$

Since these are truncated geometric random variables it is clear that $\mathbb{E}_S(c_i) \leq 2$ and
$\mathbb{V}_S(c_i) \leq 2$. Now, Bernstein's inequality (as stated in Wasserman (2006), page 9) tells us
immediately that

$$\mathbb{P}_S\left( \sum_{i \notin S} c_i - 2(n - s) > t \right) \leq \exp\left( -\frac{1}{2} \frac{t^2}{2(n - s) + \ell t / 3} \right) .$$

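Taking $t = n - s$, and noting that $\sum_{i=1}^{n} c_i \leq s\ell + \sum_{i \notin S} c_i$ we conclude that

$$\mathbb{P}_S\left( \sum_{i=1}^{n} c_i > s\ell + 3(n - s) \right) \leq \exp\left( -\frac{1}{2} \frac{n - s}{2 + \ell/3} \right) . \tag{5.9}$$

Now, provided $s < n/(\ell - 3)$, we conclude that the total number of measurements of the
algorithm is smaller than $4n$ with probability approaching 1 as $n$ grows, that is

$$\mathbb{P}_S\left( \sum_{i=1}^{n} c_i > 4n \right) \leq \exp\left( -\frac{1}{2} \frac{n - s}{2 + \ell/3} \right) . \tag{5.10}$$

Therefore the total amount of precision used is under $4np$ with high probability. Therefore,
for the choice $p = m/(4n)$ the total amount of precision used is less than $m$ with
high probability. In other words,

$$\mathbb{P}_S\left( p(k + 1) > m \right) \leq \exp\left( -\frac{1}{2} \frac{n - s}{2 + \ell/3} \right) . \tag{5.11}$$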
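
The two moment bounds for the truncated geometric variables and the resulting tail bound can be checked numerically. The Python sketch below builds the probability mass function of $c_i$ for $i \notin S$, computes its mean and variance exactly, and evaluates the right-hand side of (5.11); all function names are illustrative assumptions.

```python
import math


def truncated_geometric_pmf(num_steps):
    """Distribution of the number of measurements spent on a zero entry."""
    pmf = {x: 0.5 ** x for x in range(1, num_steps)}
    pmf[num_steps] = 0.5 ** (num_steps - 1)
    return pmf


def mean_and_variance(pmf):
    """Exact mean and variance of a finite distribution given as {value: prob}."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


def budget_overrun_bound(n, s, num_steps):
    """Right-hand side of (5.11): exp(-(1/2) (n - s) / (2 + num_steps / 3))."""
    return math.exp(-0.5 * (n - s) / (2.0 + num_steps / 3.0))


# Hypothetical example with 36 steps per entry.
mean, variance = mean_and_variance(truncated_geometric_pmf(36))
```

Both moments stay below 2 for every number of steps, and the tail bound decays quickly once $n - s$ is large compared with the number of steps per entry.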

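This result ensures the modified algorithm is essentially the same as the original one,
as in the latter we will rarely encounter the event $p(k + 1) > m$ (this statement will
be made precise later). Therefore we can proceed by analyzing the performance of the
modified algorithm. This can be done in an entry-wise fashion and we must consider the
cases $i \in S$ and $i \notin S$. For $i \notin S$ note that

$$\mathbb{P}_S(i \in \hat S) = \mathbb{P}_S\left( \bigcap_{j=1}^{\ell} \{ Y_i^{(j)} \geq 0 \} \right) = \frac{1}{2^{\ell}} .$$

For $i \in S$ we have

$$\mathbb{P}_S(i \notin \hat S) = \mathbb{P}_S\left( \bigcup_{j=1}^{\ell} \{ Y_i^{(j)} < 0 \} \right) \leq \frac{\ell}{2} \exp\left( -\frac{p \mu^2}{2} \right) ,$$

where the result follows from a Gaussian tail and the union (of events) bounds. These
two results together give

$$\begin{aligned}
\mathbb{E}_S[d(\hat S, S)] &= \sum_{i \notin S} \mathbb{P}_S(i \in \hat S) + \sum_{i \in S} \mathbb{P}_S(i \notin \hat S) \\
&\leq \frac{n - s}{2^{\ell}} + \frac{s \ell}{2} \exp\left( -\frac{p \mu^2}{2} \right) \\
&\leq \frac{n - s}{2^{\ell}} + \frac{1}{2} \exp\left( -\frac{p \mu^2 - 2 \log s - 2 \log \ell}{2} \right) .
\end{aligned}$$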



Now, given the choice $\ell = \log_2^2 n$ we conclude that the first term in the above summation
converges to 0 as $n \to \infty$, and the second term also converges to zero provided

$$p \mu^2 - 2 \log s - 2 \log \ell \to \infty$$

as $n \to \infty$. Clearly if $\mu \geq \sqrt{\frac{4n}{m} (2 \log s + 5 \log \log_2 n)}$ this condition is satisfied. To
conclude the proof all that remains to be done is to take Equation (5.11) into account to
conclude that, for the original algorithm

$$\begin{aligned}
\mathbb{E}_S[d(\hat S, S)] &\leq \mathbb{E}_S[d(\hat S, S) \,|\, p(k + 1) \leq m] + \mathbb{E}_S[d(\hat S, S) \,|\, p(k + 1) > m] \, \mathbb{P}_S(p(k + 1) > m) \\
&\leq \mathbb{E}_S[d(\hat S, S) \,|\, p(k + 1) \leq m] + n \, \mathbb{P}_S(p(k + 1) > m) \\
&\leq \frac{n - s}{2^{\ell}} + \frac{1}{2} \exp\left( -\frac{p \mu^2 - 2 \log s - 2 \log \ell}{2} \right) + n \exp\left( -\frac{1}{2} \frac{n - s}{2 + \log_2^2 n / 3} \right) .
\end{aligned}$$

Clearly, under the condition $s < n/(\ell - 3)$ all the terms above converge to zero as $n \to \infty$,
concluding the proof. □